Introduction
In this paper, we investigate whether there is a relationship between a nation’s annual oil consumption per person and its Sustainable Development Index (SDI). A nation’s SDI is calculated by taking a score based on average life expectancy, income, and education level, and dividing it by a score based on nationwide carbon footprint. We wanted to determine whether nations have found ways over time to maintain their SDI while continuing to consume oil, or whether oil use has been curbed in favor of sustainable development. To do this, we analyzed historical data from Gapminder for both annual oil consumption per person and SDI.
Data Table
Data Visualization
Visualization Over Time
Linear Regression
\[\widehat{\text{SDI}} = 67.32 - 7.714 \times (\text{Oil Consumption})\]
The above equation was determined by fitting a linear model in R and interpreting its output. The intercept, 67.32, is the predicted average Sustainable Development Index (y = SDI) for countries whose average Oil Consumption per Person is 0. It has no practical interpretation here, since an Oil Consumption per Person of exactly 0 is not something we would expect to observe. The slope for Oil Consumption per Person, -7.714, summarizes the relationship between the SDI and Oil Consumption per Person variables. Its sign is negative, suggesting a negative relationship between the two variables: countries with higher average Oil Consumption per Person tend to have lower Sustainable Development Indexes. For every increase of 1 unit in Oil Consumption per Person, there is an associated decrease of, on average, 7.714 units of SDI.
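As a small illustration of how the fitted equation is read, the sketch below plugs values into it. The analysis itself was done in R; this Python snippet is only an illustrative restatement, and the function name `predict_sdi` is a hypothetical helper, not part of the original work. Only the two coefficients are taken from the equation above.

```python
# Coefficients copied from the fitted equation above.
INTERCEPT = 67.32
SLOPE = -7.714


def predict_sdi(oil_consumption: float) -> float:
    """Predicted SDI for a given oil consumption per person (hypothetical helper)."""
    return INTERCEPT + SLOPE * oil_consumption


# The prediction at 0 is the intercept itself.
at_zero = predict_sdi(0.0)

# Raising oil consumption by one unit shifts the prediction by the slope.
difference = predict_sdi(2.0) - predict_sdi(1.0)

print(at_zero, difference)
```

The difference between any two predictions one unit apart equals the slope, which is what "an associated decrease of, on average, 7.714 units of SDI" expresses.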
Analyzing Model Residuals
| Response Variance | Fitted Value Variance | Residual Variance | Proportion of Variance Accounted for in Model |
|---|---|---|---|
| 374.7564 | 150.5912 | 224.1652 | 0.2009188 |
Discussion
The variance in the response variable, 374.7564, tells us how much countries differ in Sustainable Development Index. The variance in the fitted values, 150.5912, tells us how much the predictions from our linear regression model vary. The variance of the residuals, 224.1652, tells us how much “the left-overs” from the model vary. The reported proportion of the variability in the response values that was accounted for by our regression model is 0.2009188; thus, our linear regression model explains about 20% of the variation in Sustainable Development Index. Note, however, that the tabulated fitted-value and response variances give a ratio of 150.5912 / 374.7564 ≈ 0.40, which does not match this reported proportion, so the table entries should be rechecked. Either way, the model leaves most of the variation in Sustainable Development Index unexplained.
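The variance decomposition described above can be sketched as follows. This is a minimal Python illustration using only the standard library, not the R code used for the analysis. The data are randomly generated stand-ins (the names `oil` and `sdi`, the sample size, and the noise level are all assumptions), so the printed numbers will not match the table.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def variance_summary(x, y):
    """Variances of the response, fitted values, and residuals, plus their ratio."""
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    var_response = statistics.variance(y)
    var_fitted = statistics.variance(fitted)
    var_residual = statistics.variance(residuals)
    return {
        "response": var_response,
        "fitted": var_fitted,
        "residual": var_residual,
        # Proportion of response variance accounted for by the model (R-squared).
        "r_squared": var_fitted / var_response,
    }


# Hypothetical stand-in data, NOT the Gapminder data used in the paper.
rng = random.Random(1)
oil = [rng.uniform(0, 4) for _ in range(200)]
sdi = [67.32 - 7.714 * o + rng.gauss(0, 15) for o in oil]

summary = variance_summary(oil, sdi)
print(summary)
```

For a least-squares line with an intercept, the fitted-value variance and residual variance add up to the response variance, so the proportion explained is the fitted-value variance divided by the response variance.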
Visualizing Simulations From Our Model
The above plots compare the simulated data from our linear model with the actual data from Gapminder. At low oil consumption per person, our model tends to generate higher SDI values than appear in the original data. The range of oil consumption is also smaller in the generated dataset, and the simulated points are more widely scattered than the observed ones. The R-squared value for the simulated data is around 17%, which is lower than the 20% for the observed data. Thus, our linear model explains less of the variance in simulated SDI than in observed SDI.
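One common way to simulate data from a fitted linear model is to add normally distributed noise, with standard deviation equal to the residual standard error, to each fitted value. The sketch below assumes that approach; the source does not state exactly how the simulated data were produced. It is written in Python rather than the R used for the analysis, and `simulate_from_model` and the stand-in data are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def simulate_from_model(x, y, rng):
    """Return one simulated response per observation: fitted value plus normal noise."""
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * xi for xi in x]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    # Residual standard error, using n - 2 degrees of freedom.
    sigma = math.sqrt(rss / (len(x) - 2))
    return [fi + rng.gauss(0, sigma) for fi in fitted]


# Hypothetical stand-in data, NOT the Gapminder data used in the paper.
data_rng = random.Random(2)
oil = [data_rng.uniform(0, 4) for _ in range(150)]
sdi = [67.32 - 7.714 * o + data_rng.gauss(0, 15) for o in oil]

simulated_sdi = simulate_from_model(oil, sdi, random.Random(7))
print(simulated_sdi[:5])
```

Because the noise is drawn fresh on every run, each simulated dataset differs unless the random seed is fixed, as it is here.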
Generating Multiple Predictive Checks
The above plot shows the distribution of R-squared values obtained from linear models fit with our generated data. We generated 1000 simulated datasets, fit a model for each, and plotted the resulting R-squared values. In our initial run, the simulated datasets had R-squared values between 0.12 and 0.185. This indicates that the data simulated under our statistical model are only somewhat similar to what was observed: across these runs, the simulated data account for at most 18.5% of the variability in the observed Sustainable Development Indexes. Note that the above plot was generated at build time from a different set of random draws, so its values may differ from those quoted here.
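The repeated check can be sketched as a loop. This version assumes that each simulated dataset is compared with the observed SDI by regressing observed on simulated values and recording the R-squared; the wording above suggests this but the source does not confirm it. It is a Python illustration, not the original R code, and the data and the name `predictive_check_r2` are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(x, y):
    """R-squared of a simple linear regression of y on x (squared correlation)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def predictive_check_r2(x, y, n_sims, rng):
    """Simulate n_sims datasets from the fitted model; return one R-squared per dataset."""
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * xi for xi in x]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sigma = math.sqrt(rss / (len(x) - 2))  # residual standard error
    results = []
    for _ in range(n_sims):
        simulated = [fi + rng.gauss(0, sigma) for fi in fitted]
        # Assumption: regress observed y on the simulated values.
        results.append(r_squared(simulated, y))
    return results


# Hypothetical stand-in data, NOT the Gapminder data used in the paper.
rng = random.Random(331)
oil = [rng.uniform(0, 4) for _ in range(150)]
sdi = [67.32 - 7.714 * o + rng.gauss(0, 15) for o in oil]

r2_values = predictive_check_r2(oil, sdi, 1000, rng)
print(min(r2_values), max(r2_values))
```

A histogram of `r2_values` would play the role of the plot described above. Fixing the seed makes the quoted range reproducible between the text and the rendered figure.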
Conclusion
While there seems to be a negative relationship between SDI and Oil Consumption, we do not have enough evidence to conclude that the relationship is purely linear. Our model accounts for only about 20% of the variation in the observed dataset. For future examination, we would try different regression techniques, seek out more data to fit our model with, or both.